Achieving Domain Specificity in SMT without Overt Siloing

Authors

  • William D. Lewis
  • Chris Wendt
  • David Bullock

Abstract

We examine pooling data as a method for improving Statistical Machine Translation (SMT) quality for narrowly defined domains, such as data for a particular company or public entity. By pooling all available data, building large SMT engines, and using domain-specific target language models, we see boosts in quality, and can achieve the generalizability and resiliency of a larger SMT but with the precision of a domain-specific engine.

Similar resources

Experiments on Domain Adaptation for English-Hindi SMT

Statistical Machine Translation (SMT) systems are usually trained on large amounts of bilingual text and monolingual target language text. If a significant amount of out-of-domain data is added to the training data, the quality of translation can drop. On the other hand, training an SMT system on a small amount of training material for given in-domain data leads to narrow lexical coverage which ...

Functional Siloing? Towards a Practical Understanding of Operational Boundaries Using Critical Systems Heuristics

The paper discusses the application of Critical Systems Heuristics to the problem of functional siloing. Functional siloing refers to a situation in which the functional areas of an organisation become overly focused on local performance measures to the detriment of the organisation as a whole. The authors liken the organisational fragmentation to Ulrich’s description of dysfunctional social pl...

Grounding Imperatives to Actions is Not Enough: A Challenge for Grounded NLU for Robots from Human-Human Data

We present a proposal for a Natural Language Understanding method for simple pick-and-place robots which maps utterances to different levels in an action hierarchy. The hierarchy is a graph containing both lower-level action and higher-level goal levels. This attempts to overcome the surprising lack of overt imperative verb forms in natural task-oriented dialogue, which we show to be the case s...

Context Adaptation in Statistical Machine Translation Using Models with Exponentially Decaying Cache

We report results from a domain adaptation task for statistical machine translation (SMT) using cache-based adaptive language and translation models. We apply an exponential decay factor and integrate the cache models in a standard phrase-based SMT decoder. Without the need for any domain-specific resources we obtain a 2.6% relative improvement on average in BLEU scores using our dynamic adaptat...

Enabling Domain Experts to Model and Execute Tasks in Flexible Human-Robot Teams

Recent advances in safe human-robot coexistence make collaboration of humans and robots in achieving common goals feasible. We propose a concept that treats human and robot agents as equal partners in executing a task specified by a shared task model. Equality between agents offers high flexibility, as e.g. the team composition may change arbitrarily without interrupting the working progress. T...

Publication date: 2010